$\DeclareMathOperator{\vec}{vec}$
Free-action is the time-integral of free-energy
Now let the unknown quantities factorise into states, parameters, and hyperparameters, $ \psi \to \{u(t), \theta, \lambda\}$,
and let the approximate densities over these variables follow the mean-field assumption $q(\psi) = q(u, t)q(\theta)q(\lambda)$.
$\theta$ parameterises the first moment of the states, and is independent of $\lambda$, which parameterises the second moment.
Write
Let
Naturally,
For the internal action, $\overline U$, find its second-order truncation around its mode $\mu = \left[\mu_u, \mu_\theta, \mu_\lambda\right]^T$ (ignoring bilinear terms).
$$ \DeclareMathOperator{\tr}{tr} \begin{eqnarray*} \overline U &=& \int L(\mu, t) \\ &&+ \left\langle \frac 1 2\left[ (u - \mu_u)^T L^{(uu)} (u - \mu_u) \right.\right.\\ && +\left.\left. (\theta - \mu_\theta)^T L^{(\theta\theta)} (\theta - \mu_\theta) \right.\right.\\ && +\left.\left. (\lambda - \mu_\lambda)^T L^{(\lambda\lambda)} (\lambda - \mu_\lambda) \right] \right\rangle_{q_u q_\theta q_\lambda}dt\\ &=& \int L(\mu, t) + \frac 1 2\left[\tr\left(\Sigma_u L^{(uu)}\right) + \tr\left(\Sigma_\theta L^{(\theta\theta)}\right) + \tr\left(\Sigma_\lambda L^{(\lambda\lambda)}\right)\right]dt \end{eqnarray*} $$
Setting $\partial\overline F/\partial{\Sigma_u(t)}=0$ gives
$$ \begin{eqnarray*} &&\frac 1 2 \int dt L^{(uu)} + \frac 1 2 \int dt\Sigma^{-1}_u = 0\\ \Rightarrow && \Sigma^{-1}_u(t) = - L^{(uu)}(\mu, t) \end{eqnarray*} $$
Similarly,
$$ \begin{eqnarray*} \Sigma^{-1}_\theta &=& -\int dt L^{(\theta\theta)}(\mu, t)\\ \Sigma^{-1}_\lambda &=& -\int dt L^{(\lambda\lambda)}(\mu, t) \end{eqnarray*} $$Note that
$$ \begin{eqnarray*} L(u, t, \theta, \lambda) &=& L(u, t|\theta, \lambda) + L(\theta) + L(\lambda)\\ L^{(uu)} &=& L^{(uu)}(u, t| \theta, \lambda)\\ L^{(\theta\theta)} &=& L^{(\theta\theta)}(u, t|\theta, \lambda) + L^{(\theta\theta)}(\theta)\\ L^{(\lambda\lambda)} &=& L^{(\lambda\lambda)}(u, t|\theta, \lambda) + L^{(\lambda\lambda)}(\lambda) \end{eqnarray*} $$
With the following notation, one can write down the variational actions, each of which is the internal action expected under the respective Markov blanket.
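The result that the conditional precision equals the negative curvature of $L$ at the mode can be checked numerically. The following is a minimal sketch, assuming a hypothetical scalar Gaussian log-density (the names `log_density`, `curvature`, and the chosen mean and variance are illustrative and not part of the derivation); the curvature is estimated by a central finite difference.

```python
def log_density(u, mean=1.5, variance=0.25):
    """Hypothetical log-density L(u) of a scalar Gaussian, up to a constant."""
    return -0.5 * (u - mean) ** 2 / variance


def curvature(f, x, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2


mode = 1.5                                 # the mode of this Gaussian is its mean
precision = -curvature(log_density, mode)  # Sigma^{-1} = -L^{(uu)} at the mode
sigma = 1.0 / precision                    # recovered conditional variance

print(precision, sigma)
```

For this quadratic $L$ the recovered variance agrees with the variance used to define the density, up to floating-point error. For the time-invariant $\theta$ and $\lambda$, the same curvature would be accumulated over time before inversion.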
And the following differentials on variational actions will become useful later.
Note that the notation, $A\!\!:$, stands for matrix vectorisation, e.g., $L\!\!:_\theta^{(\theta\theta)}$ is a vectorisation of $L_\theta^{(\theta\theta)}$
which becomes
This is to be contrasted with, say, $L\!\!:_\theta^{(\theta\theta)(u)}$ and $L\!\!:_\theta^{(\theta\theta)(uu)}$, which read
respectively.
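As a small illustration of the vectorisation notation, the sketch below assumes the usual column-stacking convention for $A\!\!:$ and checks the standard identity $\operatorname{tr}(A^T B) = (A\!\!:)^T (B\!\!:)$, which is what allows trace terms to be written as inner products of vectorised matrices. The helper names and the matrices are hypothetical.

```python
def vec(matrix):
    """Stack the columns of a matrix (given as a list of rows) into a flat list."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(matrix):
    return sum(matrix[i][i] for i in range(len(matrix)))


A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

stacked = vec(A)                                      # columns (1, 3) and (2, 4)
inner = sum(a * b for a, b in zip(vec(A), vec(B)))    # (A:)^T (B:)
trace_form = trace(matmul(transpose(A), B))           # tr(A^T B)

print(stacked, inner, trace_form)
```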
Adopting these notations, the differentials of variational actions are
Suppose the time-dependent state, $u$, subsumes its motion up to arbitrarily high order; one may unpack this and write $\tilde u = (u, u', u'', \dots)^T$.
Let this generalised state move along the gradient of variational energy/action, so that it catches up with the motion one level above when the gradient vanishes:
This way, when $V_u^{(u)} = 0$ (this happens at the mode where $\tilde u = \tilde\mu$), one has
Thus, motion of the modes becomes modes of the motion. Here, $\mathcal D$ is a differential operator, or simply a delay matrix.
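To make the action of $\mathcal D$ concrete, here is a minimal sketch assuming that $\mathcal D$ acts on a truncated generalised state as a shift matrix with ones on the first superdiagonal, so that each order of motion is replaced by the next and the highest retained order maps to zero. The function names and the numerical values are hypothetical.

```python
def shift_operator(order):
    """Build an order-by-order matrix with ones on the first superdiagonal."""
    return [[1.0 if j == i + 1 else 0.0 for j in range(order)]
            for i in range(order)]


def apply(matrix, vector):
    """Matrix-vector product for lists of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


u_tilde = [2.0, -1.0, 0.5]         # hypothetical values of (u, u', u'')
D = shift_operator(len(u_tilde))
shifted = apply(D, u_tilde)        # (u', u'', 0): each order moves up one level

print(shifted)
```

Under this assumption, $\dot{\tilde\mu} = \mathcal D\tilde\mu$ says that the rate of change of each order of the mode equals the next order of the mode.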
Linearising this state motion around its mode, $\tilde\mu$, gives
Let $\tilde\varepsilon = \tilde u - \tilde\mu$, so that $\dot{\tilde\varepsilon} = \dot{\tilde u} - \dot{\tilde\mu} = \dot{\tilde u} - \mathcal D\tilde\mu$.
With substitution, write
and note $\mathcal J = \left( V_u^{(uu)} + \mathcal D\right) = \partial\dot{\tilde u}/\partial\tilde u$.
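The role of a Jacobian such as $\mathcal J$ in a local-linearisation update can be illustrated in the scalar case. The sketch below assumes the commonly quoted form of the update, $\Delta u = (e^{J\Delta t} - 1)J^{-1}f(u)$, applied to a hypothetical linear flow; the function names, the flow, and the step size are illustrative only and are not taken from the derivation here.

```python
import math


def local_linear_step(flow, jacobian, u, dt):
    """One scalar local-linearisation step: u + (exp(J dt) - 1) / J * f(u).

    Assumes the Jacobian is non-zero at u.
    """
    J = jacobian(u)
    return u + math.expm1(J * dt) / J * flow(u)


# Hypothetical linear flow du/dt = rate * (u - target), with constant Jacobian.
rate, target = -2.0, 1.0
flow = lambda u: rate * (u - target)
jacobian = lambda u: rate

u0, dt = 3.0, 0.5
u1 = local_linear_step(flow, jacobian, u0, dt)
exact = target + (u0 - target) * math.exp(rate * dt)   # closed-form solution
euler = u0 + dt * flow(u0)                             # forward Euler, for contrast

print(u1, exact, euler)
```

For a linear flow the local-linearisation step reproduces the closed-form solution for any step size, whereas forward Euler with the same step lands directly on the fixed point and misses the exponential decay.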
The updating scheme is again derived from Ozaki's local linearisation:
For parameters and hyperparameters, this reduces to
where $u=(v, x)^T$, and $p(v)$ is taken to be uninformative for now. Let the parameter, $\theta$, and hyperparameter, $\lambda$, be independent and take the following form
which altogether give the generative density an analytical form
where
If one lumps the time-dependent terms together:
one writes
And let this expression prescribe generalised motion over its states, write
For state $u$, the conditional precision, $\Lambda_u=-L(t)^{(uu)}$, is
where
Thus,
Conditional precision over parameter, $\Lambda_\theta = -L(t)^{(\theta\theta)}$, is
Let $\theta = (\theta_{:1}, \theta_{:2}, \dots, \theta_{:k}, \dots, \theta_{:K})^T$.
Conditional precision over hyperparameter, $\Lambda_\lambda = -L(t)^{(\lambda\lambda)}$, is, assuming $\lambda_i, \lambda_j\in\lambda$
where
See Differentials of determinant, and Differentials of inverses and trace, in the Matrix Reference Manual.
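The determinant differential used in such derivations, $d\ln|A| = \operatorname{tr}(A^{-1}dA)$, can be verified numerically. The following is a minimal sketch for a hypothetical $2\times 2$ positive-definite matrix and perturbation direction; the helper names and values are illustrative assumptions.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix given as a list of rows."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]


def perturb(m, direction, step):
    """Return m + step * direction, element-wise."""
    return [[m[i][j] + step * direction[i][j] for j in range(2)]
            for i in range(2)]


A = [[2.0, 0.5], [0.5, 1.0]]     # hypothetical positive-definite matrix
dA = [[1.0, 0.0], [0.0, 0.0]]    # hypothetical perturbation direction
eps = 1e-6

# Central finite difference of ln|A| along dA.
numeric = (math.log(det2(perturb(A, dA, eps)))
           - math.log(det2(perturb(A, dA, -eps)))) / (2.0 * eps)

# Analytic differential tr(A^{-1} dA).
A_inv = inv2(A)
analytic = sum(A_inv[i][k] * dA[k][i] for i in range(2) for k in range(2))

print(numeric, analytic)
```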
Before calling upon Ozaki's scheme, one recalls that the observation, which affects the variational energy as well, has to be considered:
Thus,